
13 research outputs found

The Architecture of the XtreemOS Grid Checkpointing Service

Author: Mehnert-Spahn John
Morin Christine
Ropars Thomas
Schoettner Michael
Publication venue: HAL CCSD
Publication date: 01/01/2008

The EU-funded XtreemOS project implements a grid operating system (OS) that transparently exploits distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) to implement migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed heterogeneous environment. The latter may span millions of grid nodes using different system-specific checkpointers, each saving and restoring application and kernel data structures on a grid node. In this paper we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to different checkpointing strategies that can be adapted to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows different checkpointers to be used in a uniform way. Furthermore, we discuss other grid-related checkpointing issues, including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can also be applied to any other grid middleware or distributed OS.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Independent Checkpointing in a Heterogeneous Grid Environment

Author: Feller Eugen
Mehnert-Spahn John
Morin Christine
Schoettner Michael
Publication venue: HAL CCSD
Publication date: 27/09/2010

The EU-funded XtreemOS project implements an open-source grid operating system based on Linux. To provide fault tolerance and migration for grid applications, it integrates a distributed grid-checkpointing service called XtreemGCP. This service is designed to support different checkpointing protocols and to address the underlying grid-node checkpointers (e.g. BLCR, LinuxSSI, OpenVZ) in a transparent manner through a uniform interface. In this paper, we present the integration of an independent checkpointing and rollback-recovery protocol into XtreemGCP. The solution we propose is not bound to a particular checkpointer and can thus be used transparently on top of any grid-node checkpointer. To evaluate the prototype, we run it in a heterogeneous environment composed of single-PC nodes and a Single System Image (SSI) cluster. The experimental results demonstrate the capability of the XtreemGCP service to integrate different checkpointing protocols and to independently checkpoint a distributed application within a heterogeneous grid environment. Moreover, the performance evaluation shows that our solution outperforms the existing coordinated checkpointing protocol in terms of scalability.

INRIA a CCSD electronic archive server

Massively Multiuser Virtual Environments using Object Based Sharing

Author: Möller Kim-Thomas
Müller Marc-Florian
Schoettner Michael
Schulthess Peter
Sonnenfroh Michael
Publication venue: European Association of Software Science and Technology
Publication date: 27/02/2009

Massively multiuser virtual environments (MMVEs) are becoming increasingly popular with millions of users. Commercial implementations typically rely on a traditional client/server architecture controlling the virtual world state of shared data at a central point. Message passing mechanisms are used to communicate state changes to the clients. For scalability reasons, our approach creates and deploys MMVEs in a peer-to-peer (P2P) fashion. We use standard Java technology, implementing only a few basic data-centric operations for the management of our distributed objects. Higher consistency models can easily be implemented using these basic operations. Currently, we have implemented transactional consistency offering convenient and consistent access to the shared scene graph. In this paper we describe our basic object model and the prototype implementation TGOS (Typed Grid Object Sharing). Furthermore, we discuss preliminary measurements with the virtual world Wissenheim executed on top of TGOS.

Electronic Communications of the EASST (European Association of Software Science and Technology)

Distributed Architecture for a Peer-to-Peer-Based Virtual Microscope

Author: Filler Timm
Jaegermann Andreas
Schoettner Michael
Publication venue: Springer Science and Business Media LLC
Publication date: 03/06/2013

Part 2: Work-in-Progress Papers. Virtual microscopes are commonly used in medical education. They provide a platform for distributing whole slide images (WSI), each several GB in size, to students exploring them. Even in courses with a few hundred students and dozens of WSI the network traffic may be high, and it will increase vastly when the system is opened to access from the Internet. The same applies to user-generated content such as interactive annotations (each student generates approx. 200 labels per term). In a collection of several thousand WSI, which need to be annotated for training or quiz-based purposes, there will be millions of user contributions. In an abstract view, users navigate through a universe of WSI and annotations and may meet other users watching the same or related WSI. This paper presents a distributed architecture built on PathFinder for Internet-based virtual microscopy, addressing the challenges of distributing tightly connected data chunks on an overlay network consisting of random graphs.

Compiler Support for Reference Tracking in a Type-Safe DSM

Author: Michael Schoettner
Peter Schulthess
Ralph Goeckelmann
Stefan Frenz
Publication date: 01/01/2003

The efficiency of language implementations is heavily influenced by the selected strategy for allocation and reclaim of memory. Memory allocation in a distributed shared memory (DSM) cluster poses additional challenges.

CiteSeerX

Crossref

Checkpointing Process Groups in a Grid Environment

Author: Mehnert-Spahn John
Morin Christine
Schoettner Michael
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

The EU-funded XtreemOS project implements a grid operating system that transparently exploits resources of virtual organizations through the standard POSIX interface. Grid checkpointing and restart requires saving and restoring jobs executing in a distributed heterogeneous grid environment. The latter may span millions of grid nodes (PCs, clusters, and mobile devices) using different system-specific checkpointers, each saving and restoring application and kernel data structures for processes executing on a grid node. In this paper we briefly describe the XtreemOS grid checkpointing architecture and how we bridge the gap between the abstract grid and the system-specific checkpointers. Then we discuss how we keep track of processes and how different process grouping techniques are managed to ensure that all processes of a job, and any further dependent ones, can be checkpointed and restarted. Finally, we present how Linux control groups can be used to address resource isolation issues during the restart.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1